Detecting Categories in News Video Using Acoustic, Speech, and Image Features

Authors

  • Slav Petrov
  • Arlo Faria
  • Pascal Michaillat
  • Alexander Berg
  • Andreas Stolcke
  • Dan Klein
  • Jitendra Malik

Abstract

This work describes systems for detecting semantic categories present in news video. The multimedia data was processed in three ways: the audio signal was converted to a sequence of acoustic features, automatic speech recognition provided a word-level transcription, and image features were computed for selected frames of the video signal. Primary acoustic, speech, and vision systems were trained to discriminate instances of the categories. Higher-level systems exploited correlations among the categories, incorporated sequential context, and combined the joint evidence from the three information sources. We present experimental results from the TREC video retrieval evaluation.
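The abstract describes three primary detectors (acoustic, speech, vision) whose outputs are combined by higher-level systems that also use sequential context. The combination method itself is not specified here, so the following is only a minimal sketch under stated assumptions. Each primary system is assumed to emit a per-shot probability for one category. These probabilities are fused with a weighted sum of log-odds, a common late-fusion scheme, and then smoothed over neighboring shots. The function names `fuse_scores` and `smooth_sequence`, the modality weights, and the window size are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List


def logit(p: float, eps: float = 1e-6) -> float:
    """Convert a probability to log-odds, clipping to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical late fusion: weighted sum of per-modality log-odds."""
    z = sum(weights[m] * logit(p) for m, p in scores.items())
    return sigmoid(z)


def smooth_sequence(probs: List[float], window: int = 1) -> List[float]:
    """Hypothetical sequential context: average each shot with its neighbors."""
    out = []
    for i in range(len(probs)):
        lo, hi = max(0, i - window), min(len(probs), i + window + 1)
        out.append(sum(probs[lo:hi]) / (hi - lo))
    return out


if __name__ == "__main__":
    # Assumed weights for illustration only; not taken from the paper.
    weights = {"acoustic": 0.3, "speech": 0.3, "vision": 0.4}
    shots = [
        {"acoustic": 0.2, "speech": 0.7, "vision": 0.9},
        {"acoustic": 0.6, "speech": 0.8, "vision": 0.7},
        {"acoustic": 0.1, "speech": 0.2, "vision": 0.3},
    ]
    fused = [fuse_scores(s, weights) for s in shots]
    print([round(p, 3) for p in smooth_sequence(fused)])
```

Fusing in log-odds space lets a confident detector in one modality outweigh uncertain ones in the others. The neighbor-averaging step is a deliberately simple stand-in for whatever sequential model the authors actually used.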

Similar resources

Comparison of Different Machine Learning Methods for Persian Speech-to-Speech Extractive Summarization Without Using Transcripts

In this paper, extractive speech summarization using different machine learning algorithms was investigated. The task of speech summarization is to extract important and salient segments from speech so that speech files can be accessed, searched, extracted, and browsed more easily and at lower cost. In this paper, a new method for speech summarization without using automatic speech recognitio...

Removing car shadows in video images using entropy and Euclidean distance features

Detecting car motion in video frames is one of the key subjects in the computer vision community. In recent years, different approaches have been proposed to address this issue. One of the main challenges for image processing systems developed for car detection is car shadows: shadows change the appearance of cars so that they may seem stitched to neighboring cars. This study aims ...

Recent Advances in Video Content Analysis: From Visual Features to Semantic Video Segments

This paper addresses the problem of automatically partitioning a video into semantic segments using visual low-level features only. Semantic segments may be understood as building content blocks of a video with a clear sequential content structure. Examples are reports in a news program, episodes in a movie, scenes of a situation comedy or topic segments of a documentary. In some video genres l...

Visual speech segmentation and speaker recognition for transcription of TV news

This paper describes a method for visual segmentation of TV news. In this method, TV news shows are segmented according to the visual stream of the TV video recordings. Human faces are found in the individual visual segments using a fast face-detection algorithm. The detected faces are compared with visual GMMs that have been trained from the video picture of the single broad...

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Publication date: 2006